ANDERSON_KSAMP
Overview
The ANDERSON_KSAMP function performs the k-sample Anderson-Darling test, a non-parametric statistical test that determines whether two or more samples are drawn from the same population distribution. Unlike parametric tests, it does not require specifying the underlying distribution, making it particularly versatile for exploratory data analysis.
The k-sample Anderson-Darling test extends the classic one-sample Anderson-Darling goodness-of-fit test to the comparison of multiple samples. It tests the null hypothesis that all k samples originate from a common, unspecified distribution. If the test statistic exceeds the critical value for the chosen significance level (or, equivalently, the p-value falls below that level), the null hypothesis is rejected, suggesting that at least one sample comes from a different distribution.
This implementation uses SciPy’s anderson_ksamp function from the scipy.stats module. The test is based on the methodology described by Scholz and Stephens (1987) in their paper “K-Sample Anderson-Darling Tests” published in the Journal of the American Statistical Association.
The function returns a normalized test statistic along with critical values for significance levels of 25%, 10%, 5%, 2.5%, 1%, 0.5%, and 0.1%, in that order. The p-value is interpolated from tabulated values and is floored at 0.1% and capped at 25%, so a reported p-value of 0.25 means 0.25 or larger and a reported 0.001 means 0.001 or smaller. To interpret the results, compare the test statistic against the critical values: if the statistic exceeds the critical value for a given significance level, the null hypothesis can be rejected at that level.
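The comparison of the statistic against the critical values can be sketched directly with SciPy. This is a minimal illustration, not part of the function itself: the helper name rejected_levels and the two synthetic samples are assumptions made for the example, and warnings are silenced only to keep the output short.

```python
import warnings

from scipy.stats import anderson_ksamp

# Significance levels in the same order as result.critical_values
LEVELS = [0.25, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001]


def rejected_levels(statistic, critical_values):
    """Hypothetical helper: levels at which the null hypothesis is rejected."""
    return [lvl for lvl, cv in zip(LEVELS, critical_values) if statistic > cv]


# Two synthetic, clearly separated samples (illustrative data only)
a = [0.1 * i for i in range(30)]
b = [5.0 + 0.1 * i for i in range(30)]

with warnings.catch_warnings():
    # SciPy warns when the p-value hits the 0.1% floor or the 25% cap
    warnings.simplefilter("ignore")
    res = anderson_ksamp([a, b])

levels = rejected_levels(res.statistic, res.critical_values)
print(res.statistic, res.pvalue, levels)
```

Because the two samples do not overlap at all, the statistic should exceed every critical value and the p-value should sit at the 0.1% floor.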
The midrank parameter controls which variant of the test is applied. When set to TRUE (the default), the midrank empirical distribution function is used, which is appropriate for both continuous and discrete data. When set to FALSE, the right-side empirical distribution function is used, a variant intended for discrete data where ties may occur between samples. The two variants can return different statistics for the same data (compare Examples 1 and 3), while the critical values depend only on the number of samples.
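The effect of midrank can be checked by running both variants on the same data. The sketch below calls SciPy directly on the two groups used in Examples 1 and 3 (which report statistics of -0.9399 and -0.8673); the variable names are illustrative, and the exact printed values may vary with the SciPy version.

```python
import warnings

from scipy.stats import anderson_ksamp

# The two sample groups from Examples 1 and 3
groups = [[1.1, 2.2, 3.3], [1.2, 2.1, 3.4]]

with warnings.catch_warnings():
    # Small samples trigger a "p-value capped" warning; ignored here for brevity
    warnings.simplefilter("ignore")
    mid = anderson_ksamp(groups, midrank=True)
    right = anderson_ksamp(groups, midrank=False)

# The statistics differ between variants; the critical values do not
print(round(mid.statistic, 4), round(right.statistic, 4))
print(list(mid.critical_values) == list(right.critical_values))
```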
This example function is provided as-is without any representation of accuracy.
Excel Usage
=ANDERSON_KSAMP(samples, midrank)
samples (list[list], required): Table of observations in which each row is one sample group, as in the examples below. Must contain at least two rows with at least two values each.
midrank (bool, optional, default: true): If TRUE, uses the midrank test (recommended for continuous and discrete data). If FALSE, uses the right-side empirical distribution function for discrete data.
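Returns (list[list]): 2D list with a single row holding the statistic, the p-value, and the seven critical values (for 25%, 10%, 5%, 2.5%, 1%, 0.5%, and 0.1%), or an error string.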
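Since the result is one flat row of nine numbers, it can help to attach names before using it. The following is a small sketch, assuming the row is ordered as statistic, p-value, then the seven critical values; the LABELS list and the label_result helper are hypothetical names introduced here, and the sample row is copied from Example 1.

```python
# Hypothetical labels; the order assumes statistic, p-value, then critical
# values for 25%, 10%, 5%, 2.5%, 1%, 0.5%, and 0.1%
LABELS = [
    "statistic", "p_value",
    "cv_25", "cv_10", "cv_5", "cv_2_5", "cv_1", "cv_0_5", "cv_0_1",
]


def label_result(result):
    """Turn the one-row 2D result into a dict; raise if it is an error string."""
    if isinstance(result, str):
        raise ValueError(result)
    (row,) = result  # exactly one row is expected
    if len(row) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} values, got {len(row)}")
    return dict(zip(LABELS, row))


# Row taken from Example 1
example_1 = [[-0.9399, 0.25, 0.325, 1.226, 1.961, 2.718, 3.752, 4.592, 6.546]]
labeled = label_result(example_1)

# The statistic is below even the 25% critical value, so nothing is rejected
print(labeled["statistic"] < labeled["cv_25"])
```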
Examples
Example 1: Demo case 1
Inputs:
| samples | | | midrank |
|---|---|---|---|
| 1.1 | 2.2 | 3.3 | true |
| 1.2 | 2.1 | 3.4 | |
Excel formula:
=ANDERSON_KSAMP({1.1,2.2,3.3;1.2,2.1,3.4}, TRUE)
Expected output:
| Statistic | p-value | CV 25% | CV 10% | CV 5% | CV 2.5% | CV 1% | CV 0.5% | CV 0.1% |
|---|---|---|---|---|---|---|---|---|
| -0.9399 | 0.25 | 0.325 | 1.226 | 1.961 | 2.718 | 3.752 | 4.592 | 6.546 |
Example 2: Demo case 2
Inputs:
| samples | | | midrank |
|---|---|---|---|
| 1.1 | 2.2 | 3.3 | true |
| 1.2 | 2.1 | 3.4 | |
| 1.3 | 2.3 | 3.1 | |
Excel formula:
=ANDERSON_KSAMP({1.1,2.2,3.3;1.2,2.1,3.4;1.3,2.3,3.1}, TRUE)
Expected output:
| Statistic | p-value | CV 25% | CV 10% | CV 5% | CV 2.5% | CV 1% | CV 0.5% | CV 0.1% |
|---|---|---|---|---|---|---|---|---|
| -1.3062 | 0.25 | 0.4493 | 1.3053 | 1.9434 | 2.577 | 3.4163 | 4.0721 | 5.5642 |
Example 3: Demo case 3
Inputs:
| samples | | | midrank |
|---|---|---|---|
| 1.1 | 2.2 | 3.3 | false |
| 1.2 | 2.1 | 3.4 | |
Excel formula:
=ANDERSON_KSAMP({1.1,2.2,3.3;1.2,2.1,3.4}, FALSE)
Expected output:
| Result | ||||||||
|---|---|---|---|---|---|---|---|---|
| -0.8673 | 0.25 | 0.325 | 1.226 | 1.961 | 2.718 | 3.752 | 4.592 | 6.546 |
Example 4: Demo case 4
Inputs:
| samples | | | midrank |
|---|---|---|---|
| 1.1 | 2.2 | 3.3 | false |
| 1.2 | 2.1 | 3.4 | |
| 1.3 | 2.3 | 3.1 | |
Excel formula:
=ANDERSON_KSAMP({1.1,2.2,3.3;1.2,2.1,3.4;1.3,2.3,3.1}, FALSE)
Expected output:
| Result | ||||||||
|---|---|---|---|---|---|---|---|---|
| -1.2389 | 0.25 | 0.4493 | 1.3053 | 1.9434 | 2.577 | 3.4163 | 4.0721 | 5.5642 |
Python Code
import math
import warnings

from scipy.stats import anderson_ksamp as scipy_anderson_ksamp


def anderson_ksamp(samples, midrank=True):
    """
    Performs the k-sample Anderson-Darling test to determine if samples are drawn from the same population.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson_ksamp.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        samples (list[list]): 2D list in which each inner list (row) is one sample group. Must contain at least two groups with at least two values each.
        midrank (bool, optional): If TRUE, uses the midrank test (recommended for continuous and discrete data). If FALSE, uses the right-side empirical distribution function for discrete data. Default is True.

    Returns:
        list[list]: 2D list with one row [[stat, p, seven critical values]], or error string.
    """
    # Validate samples: the outer list holds the sample groups
    if not isinstance(samples, list) or len(samples) < 2:
        return "Invalid input: samples must be a 2D list with at least two sample groups."
    if any(not isinstance(group, list) or len(group) < 2 for group in samples):
        return "Invalid input: each sample group must be a list with at least two values."

    try:
        # Copy each inner list; no transposition is performed, so every
        # row is passed to SciPy as one sample group
        groups = [list(group) for group in samples]
        # Check for non-numeric values (bool is a subclass of int, so reject it explicitly)
        for group in groups:
            for v in group:
                if isinstance(v, bool) or not isinstance(v, (int, float)):
                    return "Invalid input: all sample values must be numeric."
    except Exception:
        return "Invalid input: samples must be a 2D list of floats."

    try:
        with warnings.catch_warnings():
            # Silence the warning SciPy emits when the p-value hits the 25% cap
            warnings.filterwarnings('ignore', message='p-value capped')
            result = scipy_anderson_ksamp(groups, midrank=midrank)
    except Exception as e:
        return f"scipy.stats.anderson_ksamp error: {e}"

    # Compose output row: statistic, p-value, then the seven critical values
    output = [float(result.statistic), float(result.pvalue)]
    output += [float(cv) for cv in result.critical_values[:7]]

    # Check for nan/inf
    if not all(math.isfinite(x) for x in output):
        return "Invalid output: statistic or critical values are not finite."

    return [output]